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Inferring drug-drug interactions (DDIs) is an essential step in drug development and drug 
administration. Most computational inference methods focus on modeling drug pharmacokinetics, 
aiming at interactions that result from a common metabolizing enzyme (CYP) . Here, we introduce a 
novel prediction method, INDI (INferring Drug Interactions), allowing the inference of both 
pharmacokinetic, CYP-related DDIs (along with their associated CYPs) and pharmacodynamic, non- 
CYP associated ones. On cross validation, it obtains high specificity and sensitivity levels (AUC (area 
under the receiver-operating characteristic curve) >0.93). In application to the FDA adverse event 
reporting system, 53% of the drug events could potentially be connected to known (41%) or 
predicted (12%) DDIs. Additionally, INDI predicts the severity level of each DDI upon co- 
administration of the involved drugs, suggesting that severe interactions are abundant in the 
clinical practice. Examining regularly taken medications by hospitalized patients, 18% of the 
patients receive known or predicted severely interacting drugs and are hospitalized more 
frequently. Access to INDI and its predictions is provided via a web tool at http://www.cs.tau. 
ac.il/~bnet/software/INDI, facilitating the inference and exploration of drug interactions and 
providing important leads for physicians and pharmaceutical companies alike. 
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Introduction 

Simultaneous administration of two or more medications is by 
now a common practice, but may result in significant drug- 
drug interactions (DDIs) , altering medication effectiveness or 
even harming the patient (Manzi and Shannon, 2005) . A DDI is 
defined as any drug effect that is greater/less than expected in 
the presence of another drug (Crowther et al, 1997). As the 
number of approved drugs increases, the number of potential 
interactions between prescribed medications rapidly rises 
(Khouri et al, 2006). Moreover, certain patient groups (e.g., 
elderly patients (Bergendal et al, 1995; Juurlink et al, 2003; 
Doubova Dubova et al, 2007) or cancer patients (Riechelmann 
et al, 2007)) are typically administered numerous medications, 
exposing them to a high risk of multiple interactions. The 
increase in potential drug interactions renders the experi- 
mental discovery of those intractable, calling for large-scale 
computational prediction methods. 

DDIs are conventionally categorized into pharmacokinetic 
interactions, whereby a drug is affecting the processes by 
which another drug is absorbed, distributed, metabolized 
or excreted (Zhang et al, 2009), and pharmacodynamic 



interactions, where the effects of one drug are modified by 
the effect of another on its site of action or by affecting the 
same or cross-talking signaling pathways (Jonker et al, 2005; 
Imming et al, 2006). Most previous work concerns the 
prediction of pharmacokinetic DDIs. Due to the complex 
nature of the problem, those algorithms typically handle 
separately the absorption, distribution, metabolism and 
excretion of each compound, relying on different properties 
of the compound such as its chemical structure, permeability, 
solubility and polarity (Boobis et al, 2002). Subsequently, 
physiologically based pharmacokinetic modeling algorithms 
attempt to integrate these individual predictions into coherent 
and predictive models (Brown et al, 1997; Boobis et al, 2002). 

Within the pharmacokinetic processes, the metabolism part 
covers the largest, yet poorly understood aspect and conse- 
quently the most difficult to evaluate and predict (Boobis et al, 
2002). Most of the metabolism-related DDIs involve the 
Cytochrome P450 (CYP) enzyme superfamily (Wrighton and 
Stevens, 1992; Goshman et al, 1999; Ekins and Wrighton, 
2001). Several methods focus on predicting in-vivo affinities of 
drug-CYP interactions from in-vitro experiments (Hutzler 
et al, 2005; Fowler and Zhang, 2008; Jamei et al, 2009; 
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Obach, 2009; Zhou and Zhou, 2009), while others attempt at 
in-silico modeling of drug-CYP interactions (Hudelson et al, 
2008) (e.g., using the rate of metabolism of a drug in the 
presence of a CYP inhibitor (Kato et al, 2003)). The main 
shortcoming of these models is the need for tuning several 
pharmacokinetic parameters such as steric hindrance, lipo- 
philicity, distribution volume, renal clearance and enzyme 
degradation rates (Boobis et al, 2002; Kato et al, 2003; Obach 
et al, 2007) . A different approach is employed for predicting 
pharmacodynamic interactions, depending primarily on com- 
bining single drug in-vitro or in-vivo measurements of 
pharmacodynamic constants (Tallarida, 2001; Jonker et al, 
2005; Li et al, 2007). 

Additional prediction methods (that are not type specific) 
follow two distinct approaches. The first approach obtains 
in-vitro chemogenomic profile measurements of drug-per- 
turbed cellular systems. These approaches infer interactions 
based on coupled perturbations (Nelander et al, 2008) or 
similarity between these profiles (Jansen et al, 2009) . These 
methods were so far tested on limited cell types and validated 
only at small scale. The second approach mines potential DDIs 
from adverse drug reaction (ADR) reports (Tatonetti et al, 
2012a,b). However, the latter methods suffer from several 
limitations, including various biases in the ADRs such as 
under-reporting, duplicate reports or change in reporting 
methodologies over time (Rawlins, 1988; van der Heijden etal, 
2002; Bate and Evans, 2009), the necessity to pre-define drug 
classes, and the inability to handle novel and rarely used drugs 
for which no or limited reports exist (Tatonetti et al, 2012a). 

Here, we present a large-scale in-silico DDI prediction 
method: INferring Drug Interactions (INDI), handling both 
pharmacokinetic and pharmacodynamic DDIs and overcom- 
ing the caveats of previous methods. The algorithmic frame- 
work follows the pairwise inference scheme previously 
employed to drug-indication prediction (Gottlieb et al, 2011): 
Given a query drug pair, INDI computes its similarity to drug 
pairs that are known to interact, exploiting seven different 
drug-drug similarity measures. The similarity scores of each 
drug pair according to each similarity measure pairs allows 
INDI to determine the likelihood that the query drug pair 
interacts. We further extended this framework to predict 
interaction-specific traits. Specifically, INDI allows (i) recom- 
mending the type of action to take upon administration of the 
two drugs (contraindicate, generally avoid, adjust dosage or 
monitor) and (ii) inferring the CYP isoforms involved when 
the interaction is CYP-related. The latter may enable physi- 
cians to seek alternative medications involving different 
drug-CYP interactions, as well as considering patient-specific 
genetic polymorphisms related to certain CYP enzymes 
(Bertilsson, 1995; Goshman et al, 1999) (e.g., patients who 
are CYP2D6-deficient (mgelman-Sundberg, 2004)). 

Drug interactions are a major cause of concern in the clinical 
practice. To investigate the prevalence of interactions in 
regularly taken medications, we applied INDI to three sources 
associated with clinical usage of drugs. We find that known 
and predicted DDIs cover the majority of drug events reported 
in the FDA adverse event reporting system. We further 
demonstrate the prevalence of known DDIs as well as the 
frequency of our predictions in regularly taken (chronic) 
medications reported by hospitalized patients in Israel and in 



commonly administered drug combinations, demonstrating 
their association with frequency of admissions. 

To conclude, the novelty in this paper is threefold: (i) the 
application of our algorithmic framework allows, to the best of 
our knowledge, the first unbiased in-silico DDI prediction 
algorithm; (ii) the extension of the algorithm to predict 
interaction-associated traits; and (iii) the exploration of known 
and predicted DDIs in clinical usage. 

Results and discussion 
Assembly of a DDI gold standard 

We downloaded 74104 DDIs involving 1227 drugs from 
DrugBank (Wishart et al, 2008) (10 702 interactions) and from 
www.drugs.com (Drugsite Trust, Drugs.com) (70 099 interac- 
tions tagged as major or moderate DDIs). We distinguished 
between three types of drug interactions (Materials and 
methods): (i) CYP-related DDIs (CRDs), in which both drugs 
are known to be metabolized by the same cytochrome P450 
(CYP) enzyme and there is supporting evidence that the 
interaction is CYP-related (10106 interactions); (ii) potential 
CYP-related DDIs (PCRDs), in which both drugs are metabo- 
lized by the same CYP but there is no evidence that the 
interaction is CYP-related (18 261 interactions); and (iii) non- 
CYP-related DDIs, where no CYP is shared between the drugs 
[NCRDs, 45 737 interactions). We disregarded other metabo- 
lizing proteins that may be involved in drug interactions due to 
the low number of such interactions (Materials and methods) . 
An outline of the gold-standard assembly process is depicted in 
Figure 1A. 

We report in the Supplementary Material (Section 1) the 
correlations between the numbers of drug interactions, 
indications and side effects as well as the observed fluctua- 
tions in these numbers over a period of 50 years (1961-2010). 
Importantly, we found that the number of NCRDs is positively 
correlated with the number of indications of a drug. The fact 
that promiscuous drugs, indicated for multiple diseases, tend 
to have higher number of pharmacodynamic interactions, 
underscores the need for predicting such interactions. 
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Figure 1 Illustration of the gold-standard assembly (A) and a schematic layout 
of the validation strategies, predictions and clinical implications (B). 
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INDI — an algorithm for DDI prediction 

We designed a novel algorithm for INDI with two objectives in 
mind: (i) predicting both new CRDs and new NCRDs and (ii) 
developing a general strategy that allows predicting interac- 
tions for novel drugs for which no interaction information is 
currently available. 

Given a gold-standard set of known DDIs, INDI ranks 
additional drug pairs based on their similarity to the known 
DDIs. This similarity -based approach was shown to be 
successful in predicting drug targets (Perlman et al, 2011) 



and indications (Gottlieb et al, 2011). The algorithm works in 
three steps (Figure 2) : (i) construction of drug-drug similarity 
measures; (ii) constructing classification features based on 
these similarity measures; and (iii) application of the classifier 
to these features to predict new DDIs. To address the first 
objective, we used two distinct parts of our DDI gold 
standard — one consisting of CRDs only and the other consist- 
ing of NCRDs only. Overall, we collected 5039 CRDs, spanning 
352 drugs and 20 452 NCRDs, spanning 671 drugs for which all 
drug-drug similarity measures could be computed (Materials 
and methods) . 
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Figure 2 Illustration of INDI algorithm: computation of drug-drug similarity measures (e.g., (A); scoring query drug pairs according to their similarity to known DDIs (B); 
and integration of the similarities to classification features and subsequent interaction prediction (C). 
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Table I Performance of predicted interactions in cross validation 



Training set CRDs NCRDs 



# Drugs AUC # Drugs AUC 



All 352 0.93 ±0.003 671 0.96 ±6e~ 4 

Chemically dissimilar drugs 3 186 ±2 0.9 ±0.005 301 ± 3 0.95 ±0.004 

Drugs sharing no targets 3 - - 164 ±3 0.93 ±0.01 

Drugs with no ATC available 394 0.92 ±0.002 815 0.94 ±6e~ 4 



a The number of drugs slightly varies due to the removal process. 



For the first algorithmic step, we assembled seven drug- 
drug similarity measures including chemical similarity, ligand- 
based chemical similarity based on Reiser et al (2009), 
similarities based on registered and predicted side effects 
(Atias and Sharan, 2010; Kuhn et al, 2010), the Anatomical, 
Therapeutic and Chemical (ATC) classification system and 
three similarity measures constructed between drug targets, 
including sequence similarity, distance on a protein-protein 
interaction (PPI) network and Gene Ontology (GO) 
(Ashburner et al, 2000) semantic similarity (Materials and 
methods) . 

The second algorithmic step integrates the drug-drug 
similarities to construct classification features and subse- 
quently learns a classification rule, distinguishing between 
true and false DDIs (Materials and methods) . For each query 
drug pair, we constructed features expressing its similarity to 
the closest gold-standard DDI, by adapting the scoring scheme 
of Gottlieb et al (2011) (Materials and methods) . Each feature is 
based on a combination of two drug-drug similarity measures, 
resulting in 49 features overall. We then learned a logistic 
regression classifier that automatically weights the different 
features to yield a classification score. 

We further extended the basic prediction algorithm to 
exploit the 49 features for inferring additional information 
related to the predicted DDIs: (i) recommendations of the type 
of action to take upon simultaneous administration of the 
interacting drugs (i.e., contraindicate, generally avoid, type of 
dosage adjustment or monitor); and (ii) for CRDs, the CYP 
isoforms affecting them. 

We provide a web interface for querying predicted DDIs, 
their recommendations and their related CYPs, available at 
http://www.cs.tau.ac.il/~bnet/software/INDI. In order to 
extend our method to drugs for which not all seven similarity 
measures could be computed, we provide predictions also for 
drugs for which only the chemical structure and some drug 
targets are known (Materials and methods). Notably, the 
prediction performance remains similar also for this extended 
drug set (Table I). 



Performance evaluation 

We evaluated the performance of INDI in predicting DDIs 
using a total of 37 212 true DDIs (omitting the type of 
interaction, that is, pharmacodynamic or pharmacokinetic 
and including PCRDs) . A schematic layout of the performed 
validations, summary of the predictions and the clinical 
practice implications is displayed in Figure IB. 
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The application of INDI in a 10-fold cross validation setting 
(Materials and methods) yielded an area under the receiver- 
operating characteristic curve (AUC) of 0.97 ±4e~ 4 . Encour- 
aged by these results, we then retained the information 
regarding the interaction type, applying INDI to predict CRDs 
and NCRDs using known DDIs from each type separately. 
Evaluating these predictions using a 10-fold cross validation 
setting, we obtained similarly high AUC scores of 0.93 ± 0.003 
for CRDs and 0.96 ± 6e~ 4 for NCRDs (Table I). Using the best 
Fl-measure (harmonic mean of precision and recall) over the 
different classifier ranks, these AUCs correspond to a recall and 
precision values of 0.89 + 0.01 and 0.84 ±0.01, respectively, 
for the CRD predictions and 0.93 ± 0.004 and 0.88 ± 0.004, 
respectively, for the NCRD predictions. We note that no single 
similarity measure obtains these high AUCs: the highest AUC 
was obtained using the ligand chemical similarity (0.86 for 
CRDs and 0.88 for NCRDs; Supplementary Table SI), and 
removing each single similarity measure had a marginal effect 
on the overall AUC (<0.02). 

To confirm that INDFs performance is not biased by 
redundancy stemming from chemically similar drugs, we 
created a non-redundant drug set by filtering for similar drugs 
(Tanimoto coefficient ^0.5; Materials and methods). The 
application of INDI to this set resulted in only a minor 
performance reduction (Table I; Supplementary Figure SI). In 
order to verify that this result is not biased by the type of 
chemical similarity computation, we further excluded the 
ligand similarity measure to obtain similar, minor, perfor- 
mance degradation for both CRDs and NCRDs (AUC differ- 
enced. 01; Supplementary Figure SI). For NCRDs, we further 
verified that removing drugs sharing the same targets had a 
minor effect on performance (Table I) . 

As a second validation strategy, we predicted DDIs based on 
the smaller set of DrugBank DDIs and tested their intersection 
with the larger set of Drugsite Trust, Drugs.com interactions 
(excluding interactions appearing in DrugBank) . Reassuringly, 
10% of the predicted CRDs and 25% of the predicted NCRDs 
significantly overlapped the corresponding Drugsite Trust, 
Drugs.com DDIs (P = 0 for both) . For completeness, we carried 
out the same validation when training on the Drugsite Trust, 
Drugs.com set, constraining ourselves to the same set of drugs 
used when training on the DrugBank set. We obtained 
significant overlap of 32% of the predicted CRDs (P=2e~ 7 ) 
and 18% of the predicted NCRDs (P = 4e~ 4 ) with the 
corresponding DrugBank DDIs (Supplementary Table S4). 
Additionally, we observed a significant overlap with PCRDs 
(12% of predicted CRDs, P=5e~ 313 and 28% of predicted 
NCRDs, P= 0, see Supplementary Material, Section 2). 
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Finally, we wished to compare INDI to a layman approach. 
To this end, we devised a naive method that predicts an 
interaction between a pair of drugs based solely on the 
similarity between them (rather than using known DDIs as in 
INDI) . This naive approach proved to be a poor predictor for all 
similarity measures, resulting in AUCs <0.6 for both the CRD 
and NCRD sets (Supplementary Figure S5; Supplementary 
Material, Section 3) . 



Prediction of novel DDIs 

Next, we applied INDI to predict new interactions between 807 
drugs (Materials and methods). Our full set of predictions 
includes 14 698 predicted CRDs and 28 108 predicted NCRDs. 
This set includes known PCRDs, which were further exploited 
for validation purposes. Excluding those from the prediction 
set, we acquired 11445 and 18 601 novel CRDs and NCRDs, 
respectively (Supplementary Table S2). In all, 1781 of the 
predicted CRDs and 3208 of the predicted NCRDs involve 
drugs with no DDI information (novel drugs in our context) . 
Notably, we observed a high correlation between the numbers 
of predicted and known CRDs of a drug, and similarly for the 
NCRD predictions. The detailed analysis, as well as analyses of 
the trends in the number of predicted interactors per drug over 
a period of 50 years appears in the Supplementary Material 
(Section 1). 

To evaluate our predictions, we compared them to a 
recent collection of DDIs mined from FDA Adverse 
Event Reporting System (Tatonetti et al, 2012b). Even upon 
exclusion of known DDIs from our prediction set, we 
observed a significant overlap between the two lists, where 
39 and 33% of the CRD and NCRD predictions, respectively, 
overlapped the list of mined DDIs (P = 4e~ 95 and P = 4e~ 23 , 
respectively) . 

We further assessed the predictions in two ways. The first is 
based on the assumption that NCRDs tend to occur among 
drugs affecting the same tissues. We associated drugs with 
tissues via their indicated diseases or via tissue-specific 
expression of their targets (see Materials and methods for 
details of the tissue association scheme and the validation of 
the above assumption). Reassuringly, our predicted interac- 
tions involved drugs that were also significantly associated 
with the same tissues (P = 2e~ 34 for disease-based and 
P=7e~ 41 for gene-expression based drug-tissue associa- 
tions). A similar trend, albeit with a lower magnitude, was 
observed for the CRD predictions (Supplementary Material, 
Section 2). The second validation, assumes that NCRD drug 
pairs affect similar or cross-talking mechanisms of actions and, 
thus, may also treat disease pairs with shared molecular 
mechanisms, contributing to co-morbidity of those pairs. 
Indeed, predicted NCRDs were found to be enriched with 
the corresponding co-morbidity pairs (see Supplementary 
Material, Section 2 for details) . 



Analysis of interactions between drug classes 

We used our predictions to study which drug classes should 
not be administered concomitantly in general. We utilized the 
third level of the ATC drug classification system to compute 
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interactions between ATC classes, using only interactions 
involving severe recommendations (i.e., generally avoid or 
contraindicate, encompassing 8 % of the DDIs) . Figure 3 A and 
B display the interaction networks formed by CRDs and 
NCRDs, respectively (for clarity, we display only interactions 
between classes involving a significant fraction of each class' 
drugs, see Materials and methods). Most of the inter-class 
interactions in these figures are supported by both known and 
predicted interactions (solid lines). We identified two pre- 
dicted novel CRD-based and four NCRD-based inter-class 
interactions. We found that these novel inter-class interactions 
extend known interactions from other classes (see 
Supplementary Material, Section 4) . 



Predicting DDI recommendations and related 
CYPs 

In addition to predicting DDIs, we aimed to provide additional 
information characterizing them: (i) recommendations, 
describing the type of action to take upon administration of 
the drugs predicted to interact and (ii) CYPs influencing the 
predicted CRDs. In order to predict recommendations, we used 
the four major recommendations available for all Drugsite 
Trust, Drugs.com and some of the DrugBank interactions: 

(i) contraindicated (3% of the CRD and NCRD training set); 

(ii) generally avoid (9%); (iii) adjust dosage (2%); and (vi) 
monitor (84%). The remaining 2% could not be mapped to 
any of these recommendations (Materials and methods) . For 
the 'adjust dosage' recommendation, we provided finer- 
grained categorization by further partitioning the recommen- 
dation into four subcategories: (i) decrease dosage (53% 
of the CRD and NCRD training set with adjust dosage 
recommendation); (ii) increase dosage (9%); (iii) limit 
dosage (15%); or (iv) adjust dosage interval (17%). Additional 
5% DDIs had an unspecific adjust dosage recommendation. 
Recall that each predicted interaction (CRD or NCRD) exploits 
a set of 49 most similar known DDIs, computed by the DDI 
prediction algorithm. We formed four novel features describ- 
ing each DDI prediction by computing the frequency of each of 
the four recommendations in these 49 known DDIs, and 
trained a logistic regression classifier for each recommenda- 
tion separately using these four features (Materials and 
methods). 

On a 10-fold cross validation test, we obtained high AUC 
scores for each of the recommendation types (Table II). 
Predicting each recommendation type independently, we kept 
the most severe recommendation whenever more than one 
type was predicted for a drug pair. Overall, we predicted 
recommendations for over 97% of the predicted DDIs 
(Table II) . In order to validate the predicted recommendations, 
we exploited the fact that 21% of the predicted CRDs are 
known NCRDs or PCRDs (with associated recommendations) 
and 33% of the predicted NCRDs are known CRDs or PCRDs 
(with associated recommendations). We obtained significant 
agreements between the predicted and known recommenda- 
tions (see Supplementary Table S5 and Supplementary 
Material, Section 5). Restricting ourselves to those predicted 
DDIs with predicted adjust dosage recommendation, we used 
the same methodology to further predict one of the four 
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Figure 3 Networks of CRDs (A) and NCRDs (B) with severe recommendations between third level ATC classes. Black solid lines denote interactions supported by 
both known and predicted interactions, while red dashed lines denote interactions supported by predicted interactions only. Edge width corresponds to the average 
percentage of interacting drugs from each class and node size corresponds to the number of interactors (degree) of the class. The first letter of each ATC category 
denotes the top level, anatomical, class, including Alimentary tract and metabolism (A), Blood and blood forming organs (B), Cardiovascular system (C), Dermatologicals 
(D), Genito-urinary system and sex hormones (G), Systemic hormonal preparations, excluding sex hormones and insulins (H), Antiinfectives for systemic use (J), 
Antineoplastic and immunomodulating agents (L), Musculoskeletal system (M), Nervous system (N), Antiparasitic products, insecticides and repellents (P), Respiratory 
system (R) and Sensory organs (S). 



Table II Performance of predicted recommendations in cross validation 



Recommendation CRDs NCRDs 





% of predictions 


AUC 


% of predictions 


AUC 


Contraindicate 


1% 


0.9 ±0.01 


1% 


0.96 ±0.008 


Generally avoid 


5% 


0.91 ±0.007 


7% 


0.97 ±0.002 


Adjust dosage 


2% 


0.88 ±0.01 


1% 


0.97 ±0.005 


Decrease dosage 


27% a 


0.94 ±0.01 


43 % a 


0.98 ±0.01 


Increase dosage 


24 % a 


0.84 ±0.08 


2% a 


0.91 ±0.04 


Limit dosage 


49 % a 


0.96 ±0.005 


b 




Adjust dosage interval 


_b 




51 % a 


0.99 ±0.009 


Monitor 


89% 


0.93 ±0.003 


88% 


0.98±7e~ 4 



Percentage out of the adjust dosage recommendation. 
b No predictions due to insufficient training data. 



subtypes, providing such a subcategorization for 85 and 96% 
of the adjust dosage predicted CRDs and NCRDs, respectively 
(see Supplementary Material Section 5 and Supplementary 
Table S5 for further details and validation) . 

Using a similar methodology, we predicted which CYP 
enzymes may be the cause of the predicted CRDs — essential 
information in drug design (de Groot, 2006). Extracting the 
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CYP enzymes known to cause the interaction from each CRD 
description, resulted in seven CYP enzymes which appear 
prevalently in known CRDs (3 A4, 2D6, 2C9, 1 A2, 2C8, 2C1 and 
2B6; Materials and methods). Overall, we predicted related 
CYPs for 99% of the CRD predictions, 18% of which have 
more than one CYP (versus 14% of the known CRDs). The 
predictions were validated in a cross validation setting 
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obtaining high AUCs (see Supplementary Material, Section 5 
for details and an additional validation) . 



Analysis of the prevalence of known and predicted 
DDIs in clinical data 

We sought to analyze the prevalence of DDIs in three 
independent clinically related sources: (i) ADRs; (ii) chronic 
medications taken by hospitalized patients in Israel; and (in) 
common drug combinations. 

To analyze ADRs, we used a comprehensive set of over 
270 000 of them from the FDA Adverse Event Reporting System 
(Materials and methods) . We considered drug pairs for which 
the number of ADRs is significantly higher than would be 
expected by chance, resulting in 1988 drug pairs involving 
drugs included in our CRD and NCRD sets (Materials and 
methods). We found that 6% (P = 4e~ 27 ) of these drug pairs 
are known CRDs, 15% (P = e " 2S ) are known NCRDs and 20 % 
(P = e~ 131 ) are known PCRDs. Additionally, 4% (P= 0.004) of 
the drug pairs are novel predicted CRDs and 8% (P= 3e~ 19 ) 
are novel predicted NCRDs. Moreover, a significant portion of 
those drug pairs (4%, P=6e~ 20 ) exhibits severe recommen- 
dations (see Supplementary Material, Section 6 for further 
details) . 

Next, we investigated the frequency of DDIs in drug pairs 
taken on a regular basis (chronic drugs). We retrieved 
electronic medical records of patients hospitalized in internal 
medicine departments at the Rabin Medical Center, Israel 
over a period of 1 year (Materials and methods). We 
investigated the frequency of severe DDIs in chronic drugs 
reported to the medical staff by 9413 of these patients, each 
taking at least two medications at admission time. We found 
that 3, 8 and 7 % of the patients were taking at least two chronic 
medications with known severe CRDs, NCRDs or PCRDs, 
respectively. Additional 1% of the patients took chronic 
medications involving predicted severe CRDs and 3 % of the 
patients took chronic medications with predicted severe 
NCRDs. Expectedly, these percentages follow the random 
expectation. Overall, 19% of the patients took at least two 
chronic medications with known or predicted severe DDIs, 
with 12% taking more than one such combination. Next, we 
compared the admission frequency of three patient groups 
over the course of 1 year (removing patients who passed away 
during that period for an unbiased comparison): (i) patients 
taking medications with at least one known severe interaction 
among them (group A) ; (ii) patients taking medications with 
predicted severe interactions, but with no known severe 
interactions (group B) ; and (iii) patients without any severe 
interactions (known or predicted) (group C). Expectedly, 
patients from groups A and B take more drugs than those in 
group C (11.6+10 and 10.3 ±7.3 for groups A and B, 
respectively, versus only 6.3 + 5.9 for group C, P = e~ 165 and 
P=3e~ 24 , respectively). Interestingly, we found that the 
patients from groups A and B were hospitalized 2 ± 1.6 and 
2 + 1.4 times on average during the year, while patients from 
group C were hospitalized only 1.8 ±1.3 times on average 
(Wilcoxon ranked sum test P=2e~ 27 and P= e~ 10 , respec- 
tively) . We note that no significant age or gender differences 
were observed between both groups A or B and group C. 
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Additionally, no enrichment of the primary reason for 
hospitalization was observed (using the primary discharge 
ICD-9-CM code, two-sided Fisher exact test with false 
discovery rate of 0.01). We found that patients from group A 
tended to have richer medical history than group C patients, as 
expressed by the number of associated diagnosis and 
procedural ICD-9-CM codes prior to their hospitalization 
(number of past ICD codes = 3.5±28 versus 2.7±2.4, 
P=6e~ 16 ). However, no significant difference in medical 
history was observed between groups B (number of past 
ICD codes = 3±2.4) and C. In order to compensate for the 
effect of medical history richness on the number of hospita- 
lizations for group A patients, we further stratified these 
patients into those with: (i) scarce history (0-4 associated 
codes); (ii) moderate history (5-10 codes); and rich history 
(>10 codes). We found the same significant difference in 
number of hospitalizations between patients in group A and 
those in group C. Precisely, patients with both scarce history 
(P = e~ 12 ) and moderate history (P=0.003) displayed 
significant differences, while rich history patients showed no 
significant difference. 

One of the drug combination reported by these patients is a 
predicted CRDs (a predicted CYP3A4-related CRD), predicted 
to be contraindicated. This interaction involves Roxithromy- 
cin, a semi-synthetic macrolide antibiotic, very similar in 
composition, chemical structure and mechanism of action to 
Erythromycin (according to DrugBank) and Alfuzosin, an oc- 
adrenergic blocker used to treat benign prostatic hyperplasia. 
This prediction is not surprising, since the similar antibiotic, 
Erythromycin — a potent inhibitor of CYP3A4, is known to 
contraindicate Alfuzosin by significantly increasing the 
plasma concentrations and pharmacologic effects of Alfuzo- 
sin, primarily metabolized by this isoenzyme (according to 
Drugsite Trust, Drugs.com). Several relatively frequent inter- 
actions (38 patients) were predicted to induce a severe NCRD. 
One example of such a predicted interaction involves Atenolol, 
a cardioselective (3-adrenergic blocker and Methyldopa, an oc-2 
adrenergic agonist primarily used as antihypertensive agent. 
According to a third, independent, DDI database, ePocrates 
(www.epocrates.com), this combination has indeed an 
increased risk of severe rebound hypertension on a-2-agonist 
withdrawal (antagonistic effects, unopposed a-adrenergic 
stimulation) . 

Last, we explored combinations of drugs that are frequently 
administered concomitantly, listed in the Drug Combinations 
Database (DCDB) (Liu et al, 2010). DCDB holds approved or 
investigational drug combinations, specifying potential inter- 
actions between their components. We found that 12% of 
these drug combinations are known CRDs (including one 
contraindicated interaction) and 19% are known NCRDs 
(including three severe interactions). The use of such 
combinations implies that their benefit may, in some cases, 
outweigh the adverse effects caused by the induced interac- 
tions. Similarly, we found that 6% (9 pairs) of the drug 
combinations intersecting our drug sets are predicted CRDs 
and 13% (20 pairs) are predicted NCRDs. Three and 12 drug 
combinations, respectively, are novel predictions (i.e., not 
known PCRDs; Supplementary Table S3). 

Importantly, we predicted a recommendation of 'generally 
avoid' for two predicted NCRDs, requiring special attention. 
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The first prediction involves Fluorouracil and Methotrexate, 
both antineoplastic antimetabolites, a combination used in 
adjuvant treatment for early breast cancer (Bonadonna et al, 
1995). This combination was indeed found to be mutually 
antagonistic under some circumstances (Tattersall et al, 1973; 
Maugh, 1976) and maintaining a 24-h period between 
administrations of the two drugs was found to be preferable 
(Marsh et al, 1991) . The second, related potential severe NCRD 
drug combination, involves Methotrexate and Leucovorin, the 
last being an active metabolite of folic acid. According to the 
Drugsite Trust, Drugs.com website, co-administration of 
Methotrexate with other agents known to induce hepatotoxi- 
city may potentiate the risk of liver injury. Indeed, it is reported 
that the addition of Leucovorin to Fluorodeoxyuridine 
metabolite produces greater hepatotoxicity than when Fluor- 
odeoxyuridine used alone (King and Perry, 2001) . We highlight 
an additional example of a predicted CRD between Tolter- 
odine, an antimuscarinic drug used to treat urinary incon- 
tinence, and Ergonovine, an ergot alkaloid with uterine and 
vascular smooth muscle contractile properties in the 
Supplementary Material (Section 7) . 



Conclusions 

Our interaction prediction framework, INDI, leverages on the 
comprehensive set of known DDIs, as well as a set of basic 
drug properties such as chemical structure, drug targets and 
side effects. Unlike many DDI prediction methods, INDI does 
not require the collection of kinetic and dynamic parementers 
(Boobis et al, 2002; Obach et al, 2007). INDI attaines very 
high rates of specificity and sensitivity in cross validation 
(AUC = 0.93 and 0.96 for CRDs and NCRDs, respectively). 
Furthermore, our predictions attain significant coverage of 
independent drug interaction sources for both interaction 
types (P= 0), as well as significant fraction of an independent 
drug interaction types such as PCRDs (12 % of predicted CRDs, 
P=5e~ 313 and 28% of predicted NCRDs, P=0). As more 
experimental evidence accumulates and PCRDs are easily 
classified to pharmacodynamic and pharmacokinetic types, 
INDI may further improve its prediction power. 

A limitation of our method is that it does not take into 
consideration the method of administration of the drug (e.g., 
taken orally, injected or applied externally such as ointment or 
ophthalmic drops) . As a specific example, a corticosteroid may 
produce an interaction when taken orally while having no 
effect when applied to the skin or taken as an ophthalmic 
solution. As this information is crucial for determining 
whether an interaction will take place in clinical reality, we 
addressed this limitation by manually removing predicted 
interactions involving topically applied drugs, as the prob- 
ability of such interactions remains low. Nevertheless, this 
limitation should be taken into account by the physician in 
each case individually. 

We extended our prediction scheme to two additional traits: 
(i) predicting the relevant recommendation for a predicted 
DDI; and (ii) predicting the CYP enzymes associated with a 
CRD. While the former is essential for selecting the correct type 
of action to take upon co-administration of the drugs, the latter 
highlights suspected CYPs, allowing the consideration of 
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patient-specific genetic polymorphisms related to certain 
CYP enzymes (Bertilsson, 1995; Goshman et al, 1999) (e.g., 
patients who are CYP2D6-deficient (Ingelman-Sundberg, 
2004)). These extensions demonstrate the potential applic- 
ability of our approach for improving patient treatment and, 
ultimately, tailoring it to the patient's needs. We note that 
while we were able to predict the recommendation of the 
required type of dosage adjustment, we are limited in 
predicting the magnitude of the adjustment. 

Using known and predicted DDIs, we explored their 
implications in clinical data. Using adverse drug reports, we 
were able to show that known and predicted DDIs significantly 
cover the majority of the adverse pairs of drugs reported. 
Additionally, we found an abundance of known and predicted 
severe DDIs in chronically administered medications, avail- 
able in medical record data from hospitalized patients in Israel. 
These patients were hospitalized significantly more frequently 
than others. Finally, we detected two potentially severe DDIs in 
frequently used drug combinations. Our web tool, available 
in http://www.cs.tau.ac.il/~bnet/software/INDI, may aid 
physicians and researchers to exploit INDI's predictions in 
the clinical practice. 

We suggest that our predictions may be beneficial in three 
areas: (i) drug development, especially in post-marketing 
surveillance, aiding in verification of hazardous interactions; 
(ii) large-scale clinical trial design, addressing and assessing 
potentially hazardous drug combinations; and (iii) driving and 
directing in-vitro validation of potentially hazardous interac- 
tions for efficiency and cost reduction of large-scale biological 
experiments. As a final word of caution, we note that while our 
predictions were validated in-silico, they should be further 
tested experimentally in order to establish their clinical 
implications. A special consideration should be given for drug 
combination whose benefit may outweigh the predicted 
adverse effects between them. 



Materials and methods 
Data sets 

DDIs were extracted from DrugBank version 3 (Wishart et al, 2008) 
(freely available at www.drugbank.ca/downloads) and the http:// 
drugs.com website (excluding DDIs tagged as minor), updated by 
Cerner Multum™ 21 June 2011. The DDIs from Drugsite Trust, 
Drugs.com were downloaded by using the generic drug names 
appearing in DrugBank to reach their corresponding Drugsite Trust, 
Drugs.com DDI index pages. From each such index page, we selected 
only the interacting drugs tagged as a major or moderate interaction. 
We provide a blinded drug-ID set and the set of similarity measures in 
order to allow readers to repeat the cross validation analysis 
(Supplementary File SI). Associations between drugs and Cytochrome 
P450 (CYP) were downloaded from DrugBank (Wishart et al, 2008), 
the Flockharts Interaction Table (Flockhart, 2007) and SuperCYP 
database (Preissner et al, 2010). DDIs between drugs associated with 
the same CYP and the interaction description specifically reports a CYP 
cause for the interaction were considered CRDs, while those lacking 
specific evidence in the interaction description were considered 
PCRDs. 

Drug targets were obtained from DrugBank (Wishart etal, 2008), the 
DCDB (Liu et al, 2010), Matador (Gunther et al, 2008) and KEGG DRUG 
(Kanehisa et al, 2010) databases. Canonical simplified molecular input 
line entry specification (SMILES) (Weininger, 1988) of the drugs were 
extracted from DrugBank (Wishart et al, 2008) . Drug side effects were 
obtained from SIDER (Kuhn et al, 2010). Drug-disease associations 
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were assembled from multiple sources including DrugBank [Wishart 
et al, 2008), FDA drug labels in the DailyMed website (http:// 
dailymed.nlm.nih.gov) and from http://drugs.com exploiting the 
MetaMap tool (Aronson, 2001) to parse textual indications in the 
same manner described in Gottlieb et al (2011). Human PPIs were 
compiled from experimental and literature curated data (Xenarios et al, 
2002; Rual et al, 2005; Stelzl et al, 2005; Ewing et al, 2007; Stark et al, 
2011). Protein sequences and GO annotations (Ashburner et al, 2000) 
were downloaded from UniProt (Jain et al, 2009). Drug combinations 
were downloaded from the DCDB database (Liu et al, 2010). FDA 
Adverse Event Drugs were downloaded from the FDA Adverse Event 
Reporting System (AERS), reports available for the years 2004 through 
2011 (first quarter). Drug approval years were retrieved from 
Drugs@FDA database (http://www.accessdata.fda.gov/scripts/cder/ 
drugsatfda/) as of 19 October 2011. 



Drug-protein considerations 

There are three known types of interactions between drugs and CYPs 
(some drugs interact with a CYP in more than one way) : the drug is (i) 
metabolized by the CYP enzyme (substrates); (ii) inhibits the CYP 
activity (inhibitor); or (iii) increases the CYP activity (inducer) (Zhou, 
2008). Using information regarding each drug-CYP interaction type, 
we observed that CRDs spanned all possible drug-CYP combinations 
(e.g., a drug that is a substrate of a certain CYP interacts with a drug 
that is an inhibitor of the same CYP, two drugs that are substrates of the 
same CYP, etc.). We observed that 'simple' interactions, where each 
drug involves just one type of drug-CYP interaction comprised only 
11 % of the CRDs. We thus ignored this type of information. 

While CYPs are the major family related to drug metabolism, other 
proteins (e.g., uridine diphosphate glucuronyl transferase and 
P-glycoprotein) may be involved in drug metabolism. These proteins 
had a small number of exclusively related DDIs (23 DDIs overall, 
unrelated to CYPs) and thus were excluded from the analysis. 



Similarity measures 

We defined and computed seven drug-drug similarity measures. Three 
of the drug-drug similarities (5-7) are gene-related, based on drug 
targets downloaded from the DrugBank (Wishart et al, 2008), DCDB 
(Liu et al, 2010), Matador (Gunther et al, 2008) and KEGG DRUG 
(Kanehisa et al, 2010) databases. For drugs associated with more than 
one gene, maximal similarities between the associated genes were 
averaged (averaging over the contribution of each member in a drug 
pair for symmetry). In order to extend our method to drugs for which 
not all seven similarity measures could be computed, the web tool 
provides predictions for drugs lacking the ligand and annotation-based 
similarity measures (#2 and #4). All similarity measures were 
normalized to be in the range [0, 1] . 

We used the following drug-drug similarity measures: 

(1) Chemical-based: Canonical SMILES (Weininger, 1988) of the drug 
molecules were downloaded from DrugBank (Wishart et al, 2008) . 
Hashed fingerprints were computed using the Chemical Develop- 
ment Kit (CDK) with default parameters (Steinbeck et al, 2006). 
The similarity score between two drugs is computed based on their 
fingerprints according to the two-dimensional Tanimoto score 
(Tanimoto, 1957), which is equivalent to the Jaccard score 
(Jaccard, 1908) of their fingerprints, that is, the size of the 
intersection over the union when viewing each fingerprint as 
specifying a set of elements. 

(2) Ligand-based: The Similarity Ensemble Approach (SEA) (Reiser 
et al, 2007) relates protein receptors based on the chemical 2D 
similarity of the ligand sets modulating their function. Given a 
drug's canonical SMILES, the SEA search tool compares it against 
a compendium of ligand sets and computes E-values for those 
ligand sets. To compute a drug-drug similarity we queried drugs 
using their canonical SMILES on the SEA tool. To obtain robust 
results, we queried the drug against the two ligand databases 
provided in the tool (MDL Drug data report and WOMBAT (Olah 
et al, 2005)) and used two different methods to compute the drug 
fingerprint (Scitegic ECFP4 and Daylight), resulting in four lists of 



similar ligand sets. Unifying the four lists and filtering drug-ligand 
set pairs with E-values > 10 ~ 5 , we obtained a list of relevant 
protein-receptor families for each drug. Finally, the similarity 
between a pair of drugs was computed as the Jaccard score 
between the corresponding sets of receptor families. 

(3) Side-effect based: Drug side effects were obtained from SIDER 
(Kuhn et al, 2010), an online database containing drug side effects 
associations extracted from package inserts using text mining 
methods. We augmented this list by side effect predictions for 
drugs that are not included in SIDER based on their chemical 
properties (Atias and Sharan, 2010) . Following this latter work, we 
defined the similarity between drugs according to the Jaccard 
score between either their known side effects or top 13 predicted 
side effects in case they are unknown (following the number 
suggested by Atias and Sharan (2010)) . In order to avoid bias from 
side effects related to kidney failure, which is directly related to 
drug interactions (Rowland Yeo et al, 2011) , we removed such side 
effects from consideration including renal failure or insufficiency, 
tubular acidosis and papillary necrosis. 

(4) Annotation-based: We used the World Health Organization 
(WHO) ATC classification system (Skrbo et al, 2004). This 
hierarchical classification system categorizes drugs according to 
the organ or system on which they act, their therapeutic effect, and 
their chemical characteristics. ATC codes were obtained from 
DrugBank. To define a similarity between ATC terms we used the 
semantic similarity algorithm of (Resnik, 1999). This algorithm 
associates probabilities p(x) with all the nodes (i.e., ATC levels) x 
in the ATC hierarchy by computing the number of levels below x; it 
then calculates the similarity of two drugs as the maximum over all 
their common ancestors ATC level c of -log (p(c)). 

(5) Sequence-based: Based on a Smith-Waterman sequence alignment 
score (Smith et al, 1985) between the corresponding drug targets 
(proteins) . Following the normalization suggested in Bleakley and 
Yamanishi (2009), we divided the Smith-Waterman score by the 
geometric mean of the scores obtained from aligning each 
sequence against itself. 

(6) Closeness in a PPI network: The distances between each pair of 
drug targets were calculated using an all-pairs shortest paths 
algorithm on the human PPI network. Distances were transformed 
to similarity values using the formula described in Perlman et al 
(2011): 

S(p,p') = Ae- D{p - p ' ) (1) 

where S(p,p') is the computed similarity value between two 
proteins, D(p,p'J is the shortest path between these proteins in the 
PPI network and A was chosen according to Perlman et al (2011) to 
be 0.9 ■ e. Self-similarity was assigned a value of 1. 

(7) GO based: Semantic similarity scores between drug targets 
were calculated according to Resnik (1999), using the csbl.go R 
package (Ovaska et al, 2008) selecting the option to use all three 
ontologies. 



Combining measures to classification features 

The classification features that we used were constructed from 
association scores calculated on all possible pairs of drug-drug 
similarity measures, resulting in 49 features. For a given similarity 
measure pair (i.e., feature), the score of a given drug pair (d^, d 2 ) is 
calculated by considering the similarity of all known drug interactions 
to this pair. The computation is done as follows: First, for each known 
interaction (di, d 2 ') we compute the drug-drug similarities S(d u d/J 
and S(d 2 , d 2 ') (and symmetrically S(di, d 2 ) and S(d 2 , d/)). Next, we 
follow the method of Gottlieb et al (2011) to combine the two 
similarities to a single score by computing their geometric mean. 
Finally, the overall score is 

Score(di , d 2 ) = max d; ^ + dl A ^Js(d u d[) ■ S(d 2 ,d' 2 ) (2) 

For purposes of validation, self-similarities (in case (d 1; d 2 ) itself is a 
known interaction) are excluded. 
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Performance evaluation 

We used a 10-fold cross validation scheme to evaluate the accuracy of 
our prediction algorithm. The entire training set used for the cross 
validation included the 37 212 true interactions and a randomly 
generated set of drug pairs (not part of the positive set), the same size 
as the positive set in each training set. When training on each 
interaction type separately, the training set included 5039 (training on 
CRDs) or 20452 (training on NCRDs) true drug interactions and a 
randomly generated set of drug pairs (not part of the positive set), the 
same size as the positive set in each training set. To obtain robust AUC 
score estimates, we performed 10 independent cross validation runs, 
selecting a different negative set and different random partition of the 
training set to 10 parts in each; we then averaged the resulting AUC 
scores. Expectedly, taking a negative set of twice the size as the 
positive set had a negilible effect on the resulting AUC score (less than 
e~ 4 ). We used the MATLAB implementation of the logistic regression 
classifier (glmfit function with binomial distribution and logit linkage) 
for the prediction task. 

In order to test the effect of redundant drugs on prediction accuracy, 
we created non-redundant sets of drugs filtered by two different 
criteria: (i) chemical similarity above a Tanimoto coefficient ranging 
from 0.8 to 0.5 and (ii) target sharing. To this end, we iteratively 
selected the most similar pair and randomly removed one of the pair's 
drugs. We repeated this procedure 10 times for each similarity 
threshold to construct different non-redundant sets and averaged over 
the resulting AUC score (reporting also the AUC standard deviation) . 



Novel predictions and prediction assessment 

To predict novel interactions for drugs, we used a training set that 
included all the known DDIs and an equally sized, randomly generated 
set of drug pairs that are not known to interact. We applied the trained 
classifier to a set of all the remaining possible drug pairs, including 
drugs with no known DDIs. We repeated the analysis with another 
randomly picked negative set, distinct from the first one, to assign 
prediction scores also to the random negative set that we initially used 
for training. The negative set was comprised solely of drugs with 
known DDIs in order to ensure a more robust definition of negative 
interactions. Overall, we obtained classification scores for all 320182 
and 304 769 drug pairs in the CRD and NCRD training sets, 
respectively. Following Gottlieb et al (2011), we selected a cutoff for 
the ranked list of predictions according to the best Fl-measure 
obtained from the cross validation. 

In addition to the cross validation, we validated our predictions by 
training only on the DrugBank DDIs and comparing the resulting 
predictions to known interactions from Drugsite Trust, Drugs.com. 
Additional validations for NCRDs include testing their agreement with 
tissue-specific information on the drugs and with co-morbidity of their 
indicated diseases. For the tissue validation, we constructed a tissue- 
based drug-drug similarity measure. We initially verified that drugs 
interacting via known NCRDs are significantly more similar to each 
other based on shared tissues than non-interacting pairs of drugs, 
supporting our assumption (Wilcoxon ranked sum test, P= 3e~ 81 for 
disease-based and P=7e~ 45 for expression based drug-tissue 
associations; see Supplementary Material (Section 2) for details). 

All enrichment P-values were computed using a hypergeometric 
test and tissue specificity validation computed a Wilcoxon ranked 
sum test. 



Prediction of DDI recommendations and 
DDI-causing CYPs 

We extracted the set of four recommendations available in the 
descriptions of each DDI defined in the Drugsite Trust, Drugs.com 
source: Contraindicate, generally avoid, adjust dosage and monitor. 
We further searched for the following keywords, defined in decreasing 
rank of severity: contraindicate, avoid, adjust dose/dosage and 
monitor in the DrugBank descriptions, followed by manual curation. 
A DrugBank description containing more than one keyword was 
classified according to the most severe term. DDIs having no 



recommendation were classified as 'unknown'. We further divided 
the predicted DDIs with an adjust dosage recommendation prediction 
into four subgroups: (i) decrease dosage; (ii) increase dosage; (iii) limit 
dosage; or (iv) adjust dosage interval. 

Each predicted interaction (CRD or NCRD) exploits a set of 49 most 
similar known DDIs. We formed four novel features describing each 
drug pair by computing the frequency of each of the four recommen- 
dations in these 49 known DDIs. We further trained a logistic 
regression classifier for each recommendation separately using these 
four features (in fact, since the four features sum to one, we removed 
one of the redundant features — the 'monitor' frequency feature). 
Performing the cross validation for each recommendation type, we 
selected all the DDIs sharing the same recommendation as the positive 
set and another, equally sized, random set of negative examples, 
comprised of the DDIs from the remaining three recommendations (for 
the 'monitor' recommendation type, spanning 84% of the total DDIs, 
we used the 14% remaining DDIs with known recommendation as a 
negative set). Finally, we applied the same framework as in the DDI 
prediction task to predict the recommendation type for each of the CRD 
and NCRD predictions (including novel drugs), training separately on 
each recommendation, using the entire positive set and a negative set 
of equal size. We selected the best Fl-measure from each recommen- 
dation cross validation for choosing each recommendation prediction 
cutoff. If a DDI prediction was classified to more than one 
recommendation, we kept only the most severe one. Predicting adjust 
dosage subcategories, we applied the same procedure, constraining 
ourselves only to these predicted DDIs receiving an adjust dosage 
recommendation. If a DDI was classified to more than one 
subcategory, we retained the more general top level adjust dosage 
recommendation without a finer specification (including 1 5 and 4 % of 
the adjust dosage predicted CRDs and NCRDs, respectively). 

For the prediction of the CYP enzymes which are a possible cause of 
a CRD prediction, we extracted the name of the CYP enzymes involved 
in the gold-standard CRDs from the interaction description. We used 
the top seven CYP enzymes appearing in our training set, appearing in 
>10 DDIs each (we excluded CYP 3A3, 2E1 and 2A6, appearing in 
<4 DDIs each in our set). Following the same methodology as in the 
recommendation task, we calculated the frequency of each of the 
seven CYP enzymes across the 49 features describing a CRD 
prediction. Unlike the recommendation algorithm, known CRDs may 
have more than one CYP causing them, thus we normalized the 
frequency by the maximal CYP occurring and retained all seven 
features (not necessarily summing to one). 



Analysis of interactions between drug classes 

In order to focus on prevalent interactions between drug classes, we 
required that each class contains at least five drugs in our set. 
Furthermore, for visualization purpose, we filtered inter-class inter- 
actions that involved <25% or 40% of the drugs in each class for 
CRDs or NCRDs, respectively. Last, self-interactions were removed due 
to trivial additive interactions. 



Analysis of ADRs 

We downloaded the FDA AERS data files and extracted all events that 
include one or two drugs. We kept only drug pairs we could map both 
to DrugBank drugs, using the generic name, synonyms and brand 
names available in DrugBank, followed by a manual curation to 
identify additional inexact drug name matches, resulting in 31 841 
unique drug pairs, reported 272150 times overall. We next used a 
hypergeometric test to extract drug pairs that were reported more than 
would be expected based on the overall occurrences of each individual 
drug in these reports, and corrected for multiple hypothesis testing 
using a false discovery rate of 0.01. This resulted in 1988 drug pairs, 
147 of which contained drugs with no known DDIs. 

Analysis of electronic medical records 

We retrieved electronic medical records describing the medications 
taken on a regular basis reported by patients administered in either 
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internal or acute geriatric units at the Rabin Medical center, Israel 
between August 2010 and August 2011 . The patients were identified by 
a randomly generated patient id. The study was approved by the 
Helsinki Committee of the Rabin Medical Center. 



Supplementary information 

Supplementary information is available at the Molecular Systems 
Biology website (www.nature.com/msb). 
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